
Learning to combine multiple string similarity metrics for effective toponym matching



Abstract

Several tasks in geographical information retrieval and the geographical information sciences involve toponym matching, that is, the problem of matching place names that share a common referent. In this article, we present the results of a wide-ranging evaluation of the performance of different string similarity metrics on the toponym matching task. We also report on experiments that use supervised machine learning to combine multiple similarity metrics, which has the natural advantage of avoiding the manual tuning of similarity thresholds. Experiments with a very large dataset show that the performance differences between the individual similarity metrics are relatively small, and that carefully tuning the similarity threshold is important for achieving good results. The methods based on supervised machine learning, particularly those using ensembles of decision trees, can achieve good results on this task, significantly outperforming the individual similarity metrics.
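The idea of learning a combination of similarity metrics, rather than hand-tuning one threshold per metric, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three metrics (normalized edit distance, character-bigram Jaccard, and the `difflib` ratio), the toy labeled pairs, and the helper names are not taken from the article, and a small bagged ensemble of decision stumps stands in for the tree ensembles the abstract mentions so that the example needs only the standard library.

```python
import random
from difflib import SequenceMatcher


def levenshtein_sim(a, b):
    """Similarity in [0, 1]: one minus the length-normalized edit distance."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def bigram_jaccard(a, b):
    """Jaccard overlap of the character-bigram sets of both strings."""
    set_a = {a[i:i + 2] for i in range(len(a) - 1)}
    set_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def ratcliff_sim(a, b):
    """Similarity ratio computed by difflib.SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


# Illustrative metric set; the article evaluates its own, different list.
METRICS = (levenshtein_sim, bigram_jaccard, ratcliff_sim)


def features(a, b):
    """One similarity score per metric, computed on lowercased names."""
    a, b = a.lower(), b.lower()
    return [metric(a, b) for metric in METRICS]


def fit_stump(X, y):
    """Pick the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            errors = sum((row[f] >= t) != label for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def fit_ensemble(X, y, n_stumps=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_stumps):
        idx = [rng.randrange(len(X)) for _ in X]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict(stumps, x):
    """Majority vote: True means the two names are predicted to match."""
    votes = sum(x[f] >= t for f, t in stumps)
    return votes * 2 > len(stumps)


# Hypothetical toy pairs, NOT the article's dataset: (name, name, same place?)
PAIRS = [
    ("Lisbon", "Lisboa", True),
    ("Saint Petersburg", "St. Petersburg", True),
    ("New York", "New York City", True),
    ("Zurich", "Zuerich", True),
    ("Lisbon", "London", False),
    ("Paris", "Prague", False),
    ("Vienna", "Venice", False),
    ("Porto", "Oslo", False),
]

X = [features(a, b) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
stumps = fit_ensemble(X, y)

print(predict(stumps, features("Lisbon", "Lisbon")))
```

Each stump learns its own threshold from the labeled pairs, so no threshold is set by hand. A realistic setup would use a much larger training set and a full tree-ensemble learner, with held-out data to measure accuracy.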
